Profile callprofiler with different testcases by ThreeMonth03 · Pull Request #829 · solvcon/modmesh

ThreeMonth03 · 2026-05-25T17:55:19Z

Introduction

Issue #831 implies that a slow callprofiler might make profiling error bigger. To monitor whether callprofiler works fast in every cases, this pull request profilers callprofiler with different testcases by gprof. Additionally, this pull request shows what and why profiler works slowly under some cases.

Analysis

There are 4 functions and 4 kinds of hyperparameters, totally 16 combinations in benchmark. As for the detail of methodology and profiling result, please go over in appendix section.

The following sheet is the consuming time of top 5 combinations:

Rank	Benchmark / Function pattern	Operations	Repeats	Workload seconds
1	`wide_siblings`	50000	1	6.167120 s
2	`wide_siblings`	1000	1000	2.107130 s
3	`wide_siblings`	10000	5	0.807280 s
4	`balanced_tree`	1000	1000	0.267998 s
5	`deep_chain`	1000	1000	0.253346 s

We notice that top 3 slowest functions are wide_siblings(), which looks like:

void wide_siblings()
{
     profile_empty_function1();
     profile_empty_function2();
     // ...
     profile_empty_function{Operations}();
}

It means that callprofiler works slow when profiling a function, which has a large number of direct callees in the same call depth. Especially the top 1 case, it takes 6 second for callprofiler to profile 50000 empty direct callees.

Profiler works slowly in these cases, since the data structure of profiler is trie, and it takes linear time to find target children for a node. The data structure of children list is std::list.

modmesh/cpp/modmesh/toggle/RadixTree.hpp

Lines 90 to 95 in 5a7beb6

    
           RadixTreeNode<T> * get_child(key_type key) const 
        
           { 
        
               auto it = std::find_if(m_children.begin(), m_children.end(), [&](const auto & child) 
        
                                      { return child->key() == key; }); 
        
               return (it != m_children.end()) ? it->get() : nullptr; 
        
           }

modmesh/cpp/modmesh/toggle/RadixTree.hpp

Line 52 in 27a0019

using child_list_type = std::list<std::unique_ptr<RadixTreeNode<T>>>;

Every node in trie corresponds to a registered function, and there is a pointer pointing to a node which function is executing. When a registered function calls a registered callee, the profiler would find the corresponding child nodes at first, then move the pointer to the child node. If a registered function returns, the pointer would be moved to its parent.

Therefore, if there are n registered callees, it takes quardratic times to search n children nodes. For example, there are n = 50000 registered callees in the first case. It takes some time to find target children nodes of n callees, since O(n^2) ~= O(50000^2).

In conclusion, if callprofiler needs profiling a function with a lot of direct callees in real life, the search capability of CallProfiler::start_caller() should be optimized furthermore.

Appendix

Benchmark

To simulate the requirements in real life, I design 4 types of functions in benchmarks. In every functions, there are some operations, which are empty functions with different names.

Function types

wide_siblings: All subfunctions are at the same layer.

#define PROFILE_EMPTY(ID)                 \
    void profile_empty_##ID()             \
    {                                     \
        MODMESH_PROFILE_FUNCTION();       \
    }

PROFILE_EMPTY(000000)
PROFILE_EMPTY(000001)
PROFILE_EMPTY(000002)
PROFILE_EMPTY(000003)

void run_wide_siblings()
{
    MODMESH_PROFILE_FUNCTION();

    profile_empty_000000();
    profile_empty_000001();
    profile_empty_000002();
    profile_empty_000003();
}

deep_chain: All subfunctions are at the different layers.

void profile_empty_000003()
{
    MODMESH_PROFILE_FUNCTION();
}

void profile_empty_000002()
{
    MODMESH_PROFILE_FUNCTION();
    profile_empty_000003();
}

void profile_empty_000001()
{
    MODMESH_PROFILE_FUNCTION();
    profile_empty_000002();
}

void profile_empty_000000()
{
    MODMESH_PROFILE_FUNCTION();
    profile_empty_000001();
}

void run_deep_chain()
{
    MODMESH_PROFILE_FUNCTION();
    profile_empty_000000();
}

balanced_tree: The dependency of subfunctions looks like a balanced tree.

void profile_empty_000000() { MODMESH_PROFILE_FUNCTION(); }
void profile_empty_000002() { MODMESH_PROFILE_FUNCTION(); }
void profile_empty_000004() { MODMESH_PROFILE_FUNCTION(); }
void profile_empty_000006() { MODMESH_PROFILE_FUNCTION(); }

void profile_empty_000001()
{
    MODMESH_PROFILE_FUNCTION();
    profile_empty_000000();
    profile_empty_000002();
}

void profile_empty_000005()
{
    MODMESH_PROFILE_FUNCTION();
    profile_empty_000004();
    profile_empty_000006();
}

void profile_empty_000003()
{
    MODMESH_PROFILE_FUNCTION();
    profile_empty_000001();
    profile_empty_000005();
}

void run_balanced_tree()
{
    MODMESH_PROFILE_FUNCTION();
    profile_empty_000003();
}

hot_name_reuse: All subfunctions are at the same layer, but reuse the same name for 100 consecutive subfunctions, then switch to the next name.

#define PROFILE_EMPTY(ID)                 \
    void profile_empty_##ID()             \
    {                                     \
        MODMESH_PROFILE_FUNCTION();       \
    }

PROFILE_EMPTY(000000)
PROFILE_EMPTY(000001)
PROFILE_EMPTY(000002)
PROFILE_EMPTY(000003)

void run_hot_name_reuse()
{
    MODMESH_PROFILE_FUNCTION();

    for (std::size_t repeat = 0; repeat < 100; ++repeat)
    {
        profile_empty_000000();
        profile_empty_000001();
        profile_empty_000002();
        profile_empty_000003();
    }
}

By the way, this pull request sweeps the number of operations in every functions. The number of operations might be 100, 1000, 10000, or 50000. To obtain a precise profiling result, this pull request also repeats and resets the profiler when the number of operations is small, because gprof is sampling-based profiler.

The following data are measured on WSL2, with intel 13700 CPU.
Since gprof is integrated with g++, scripts in this pull request are only supported for linux platform now.

wide_siblings

operations	repeats	workload seconds
100	10000	2.293520E-01
1000	1000	2.107130E+00
10000	5	8.072800E-01
50000	1	6.167120E+00

gprof top 5: operations `100`, repeats `10000`

  %   cumulative   self              self     total           
 time   seconds   seconds    calls  us/call  us/call  name    
 55.56      0.05     0.05  1010000     0.05     0.05  modmesh::CallProfiler::start_caller(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::function<void ()> const&)
 22.22      0.07     0.02    59617     0.34     0.34  modmesh::profiling::detail::profile_empty_049983(unsigned long, unsigned long)
 11.11      0.08     0.01  1000000     0.01     0.06  modmesh::CallProfilerProbe::CallProfilerProbe(modmesh::CallProfiler&, char const*)
 11.11      0.09     0.01                             modmesh::profiling::detail::profile_empty_000090(unsigned long, unsigned long)
  0.00      0.09     0.00  3030000     0.00     0.00  std::_Function_handler<void (), modmesh::CallProfilerProbe::CallProfilerProbe(modmesh::CallProfiler&, char const*)::{lambda()#1}>::_M_manager(std::_Any_data&, std::_Any_data const&, std::_Manager_operation)

gprof top 5: operations `1000`, repeats `1000`

  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name    
 92.43      1.71     1.71  1001000     0.00     0.00  modmesh::CallProfiler::start_caller(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::function<void ()> const&)
  2.16      1.75     0.04  1001000     0.00     0.00  std::pair<std::_Rb_tree_iterator<modmesh::RadixTreeNode<modmesh::CallerProfile>*>, bool> std::_Rb_tree<modmesh::RadixTreeNode<modmesh::CallerProfile>*, modmesh::RadixTreeNode<modmesh::CallerProfile>*, std::_Identity<modmesh::RadixTreeNode<modmesh::CallerProfile>*>, std::less<modmesh::RadixTreeNode<modmesh::CallerProfile>*>, std::allocator<modmesh::RadixTreeNode<modmesh::CallerProfile>*> >::_M_insert_unique<modmesh::RadixTreeNode<modmesh::CallerProfile>*>(modmesh::RadixTreeNode<modmesh::CallerProfile>*&&)
  1.08      1.77     0.02  1001000     0.00     0.00  modmesh::CallProfiler::end_caller()
  1.08      1.79     0.02     8727     0.00     0.00  modmesh::profiling::detail::profile_empty_049983(unsigned long, unsigned long)
  0.54      1.83     0.01                             modmesh::profiling::detail::profile_empty_000495(unsigned long, unsigned long)

gprof top 5: operations `10000`, repeats `5`

  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name    
 96.00      0.72     0.72    50005     0.01     0.01  modmesh::CallProfiler::start_caller(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::function<void ()> const&)
  1.33      0.73     0.01   150015     0.00     0.00  std::_Function_handler<void (), modmesh::CallProfilerProbe::CallProfilerProbe(modmesh::CallProfiler&, char const*)::{lambda()#1}>::_M_manager(std::_Any_data&, std::_Any_data const&, std::_Manager_operation)
  1.33      0.74     0.01    50005     0.00     0.00  std::pair<std::_Rb_tree_iterator<modmesh::RadixTreeNode<modmesh::CallerProfile>*>, bool> std::_Rb_tree<modmesh::RadixTreeNode<modmesh::CallerProfile>*, modmesh::RadixTreeNode<modmesh::CallerProfile>*, std::_Identity<modmesh::RadixTreeNode<modmesh::CallerProfile>*>, std::less<modmesh::RadixTreeNode<modmesh::CallerProfile>*>, std::allocator<modmesh::RadixTreeNode<modmesh::CallerProfile>*> >::_M_insert_unique<modmesh::RadixTreeNode<modmesh::CallerProfile>*>(modmesh::RadixTreeNode<modmesh::CallerProfile>*&&)
  1.33      0.75     0.01                             modmesh::profiling::detail::profile_empty_003445(unsigned long, unsigned long)
  0.00      0.75     0.00    50005     0.00     0.00  modmesh::CallProfiler::end_caller()

gprof top 5: operations `50000`, repeats `1`

  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name    
 98.69      6.03     6.03    50001     0.12     0.12  modmesh::CallProfiler::start_caller(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::function<void ()> const&)
  0.33      6.05     0.02    50001     0.00     0.00  std::pair<std::_Rb_tree_iterator<modmesh::RadixTreeNode<modmesh::CallerProfile>*>, bool> std::_Rb_tree<modmesh::RadixTreeNode<modmesh::CallerProfile>*, modmesh::RadixTreeNode<modmesh::CallerProfile>*, std::_Identity<modmesh::RadixTreeNode<modmesh::CallerProfile>*>, std::less<modmesh::RadixTreeNode<modmesh::CallerProfile>*>, std::allocator<modmesh::RadixTreeNode<modmesh::CallerProfile>*> >::_M_insert_unique<modmesh::RadixTreeNode<modmesh::CallerProfile>*>(modmesh::RadixTreeNode<modmesh::CallerProfile>*&&)
  0.16      6.06     0.01    50000     0.00     0.12  modmesh::CallProfilerProbe::CallProfilerProbe(modmesh::CallProfiler&, char const*)
  0.16      6.07     0.01        1    10.00    10.00  void std::_Hashtable<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, int>, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, int> >, std::__detail::_Select1st, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, false, true> >::_M_assign<std::_Hashtable<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, int>, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, int> >, std::__detail::_Select1st, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, false, true> > const&, std::__detail::_ReuseOrAllocNode<std::allocator<std::__detail::_Hash_node<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, int>, true> > > >(std::_Hashtable<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, int>, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, int> >, std::__detail::_Select1st, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, false, true> > const&, std::__detail::_ReuseOrAllocNode<std::allocator<std::__detail::_Hash_node<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, int>, true> > > const&)
  0.16      6.08     0.01                             modmesh::profiling::detail::profile_empty_011490(unsigned long, unsigned long)

deep_chain

operations	repeats	workload seconds
100	10000	1.706320E-01
1000	1000	2.533460E-01
10000	5	3.774810E-02
50000	1	6.268040E-02

gprof top 5: operations `100`, repeats `10000`

  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name    
 33.33      0.01     0.01  1010000     0.00     0.00  std::pair<std::_Rb_tree_iterator<modmesh::RadixTreeNode<modmesh::CallerProfile>*>, bool> std::_Rb_tree<modmesh::RadixTreeNode<modmesh::CallerProfile>*, modmesh::RadixTreeNode<modmesh::CallerProfile>*, std::_Identity<modmesh::RadixTreeNode<modmesh::CallerProfile>*>, std::less<modmesh::RadixTreeNode<modmesh::CallerProfile>*>, std::allocator<modmesh::RadixTreeNode<modmesh::CallerProfile>*> >::_M_insert_unique<modmesh::RadixTreeNode<modmesh::CallerProfile>*>(modmesh::RadixTreeNode<modmesh::CallerProfile>*&&)
 33.33      0.02     0.01  1000000     0.00     0.00  modmesh::CallProfilerProbe::CallProfilerProbe(modmesh::CallProfiler&, char const*)
 33.33      0.03     0.01    10000     0.00     0.00  modmesh::RadixTreeNode<modmesh::CallerProfile>::~RadixTreeNode()
  0.00      0.03     0.00  3030000     0.00     0.00  std::_Function_handler<void (), modmesh::CallProfilerProbe::CallProfilerProbe(modmesh::CallProfiler&, char const*)::{lambda()#1}>::_M_manager(std::_Any_data&, std::_Any_data const&, std::_Manager_operation)
  0.00      0.03     0.00  1010000     0.00     0.00  modmesh::CallProfiler::end_caller()

gprof top 5: operations `1000`, repeats `1000`

  %   cumulative   self              self     total           
 time   seconds   seconds    calls  us/call  us/call  name    
 33.33      0.01     0.01  1001000     0.01     0.01  modmesh::CallProfiler::start_caller(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::function<void ()> const&)
 33.33      0.02     0.01  1001000     0.01     0.01  std::pair<std::_Rb_tree_iterator<modmesh::RadixTreeNode<modmesh::CallerProfile>*>, bool> std::_Rb_tree<modmesh::RadixTreeNode<modmesh::CallerProfile>*, modmesh::RadixTreeNode<modmesh::CallerProfile>*, std::_Identity<modmesh::RadixTreeNode<modmesh::CallerProfile>*>, std::less<modmesh::RadixTreeNode<modmesh::CallerProfile>*>, std::allocator<modmesh::RadixTreeNode<modmesh::CallerProfile>*> >::_M_insert_unique<modmesh::RadixTreeNode<modmesh::CallerProfile>*>(modmesh::RadixTreeNode<modmesh::CallerProfile>*&&)
 33.33      0.03     0.01     1000    10.00    10.00  void std::_Hashtable<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, int>, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, int> >, std::__detail::_Select1st, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, false, true> >::_M_assign<std::_Hashtable<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, int>, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, int> >, std::__detail::_Select1st, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, false, true> > const&, std::__detail::_ReuseOrAllocNode<std::allocator<std::__detail::_Hash_node<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, int>, true> > > >(std::_Hashtable<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, int>, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, int> >, std::__detail::_Select1st, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, false, true> > const&, std::__detail::_ReuseOrAllocNode<std::allocator<std::__detail::_Hash_node<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, int>, true> > > const&)
  0.00      0.03     0.00  3003000     0.00     0.00  std::_Function_handler<void (), modmesh::CallProfilerProbe::CallProfilerProbe(modmesh::CallProfiler&, char const*)::{lambda()#1}>::_M_manager(std::_Any_data&, std::_Any_data const&, std::_Manager_operation)
  0.00      0.03     0.00  1001000     0.00     0.02  modmesh::CallProfiler::end_caller()

gprof top 5: operations `10000`, repeats `5`

  %   cumulative   self              self     total           
 time   seconds   seconds    calls  Ts/call  Ts/call  name    
  0.00      0.00     0.00   150015     0.00     0.00  std::_Function_handler<void (), modmesh::CallProfilerProbe::CallProfilerProbe(modmesh::CallProfiler&, char const*)::{lambda()#1}>::_M_manager(std::_Any_data&, std::_Any_data const&, std::_Manager_operation)
  0.00      0.00     0.00    50005     0.00     0.00  modmesh::CallProfiler::end_caller()
  0.00      0.00     0.00    50005     0.00     0.00  modmesh::CallProfiler::start_caller(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::function<void ()> const&)
  0.00      0.00     0.00    50005     0.00     0.00  std::pair<std::_Rb_tree_iterator<modmesh::RadixTreeNode<modmesh::CallerProfile>*>, bool> std::_Rb_tree<modmesh::RadixTreeNode<modmesh::CallerProfile>*, modmesh::RadixTreeNode<modmesh::CallerProfile>*, std::_Identity<modmesh::RadixTreeNode<modmesh::CallerProfile>*>, std::less<modmesh::RadixTreeNode<modmesh::CallerProfile>*>, std::allocator<modmesh::RadixTreeNode<modmesh::CallerProfile>*> >::_M_insert_unique<modmesh::RadixTreeNode<modmesh::CallerProfile>*>(modmesh::RadixTreeNode<modmesh::CallerProfile>*&&)
  0.00      0.00     0.00    50000     0.00     0.00  modmesh::CallProfilerProbe::CallProfilerProbe(modmesh::CallProfiler&, char const*)

gprof top 5: operations `50000`, repeats `1`

  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name    
 25.00      0.01     0.01    50001     0.00     0.00  modmesh::CallProfiler::start_caller(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::function<void ()> const&)
 25.00      0.02     0.01       13     0.77     0.77  std::_Hashtable<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, int>, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, int> >, std::__detail::_Select1st, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, false, true> >::_M_rehash(unsigned long, unsigned long const&)
 25.00      0.03     0.01        1    10.00    10.00  void std::_Hashtable<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, int>, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, int> >, std::__detail::_Select1st, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, false, true> >::_M_assign<std::_Hashtable<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, int>, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, int> >, std::__detail::_Select1st, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, false, true> > const&, std::__detail::_ReuseOrAllocNode<std::allocator<std::__detail::_Hash_node<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, int>, true> > > >(std::_Hashtable<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, int>, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, int> >, std::__detail::_Select1st, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, false, true> > const&, std::__detail::_ReuseOrAllocNode<std::allocator<std::__detail::_Hash_node<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, int>, true> > > const&)
 25.00      0.04     0.01                             modmesh::profiling::detail::profile_empty_043209(unsigned long, unsigned long)
  0.00      0.04     0.00   150003     0.00     0.00  std::_Function_handler<void (), modmesh::CallProfilerProbe::CallProfilerProbe(modmesh::CallProfiler&, char const*)::{lambda()#1}>::_M_manager(std::_Any_data&, std::_Any_data const&, std::_Manager_operation)

balanced_tree

operations	repeats	workload seconds
100	10000	1.841660E-01
1000	1000	2.679980E-01
10000	5	3.871620E-02
50000	1	5.947280E-02

gprof top 5: operations `100`, repeats `10000`

  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name    
 60.00      0.03     0.03  1010000     0.00     0.00  std::pair<std::_Rb_tree_iterator<modmesh::RadixTreeNode<modmesh::CallerProfile>*>, bool> std::_Rb_tree<modmesh::RadixTreeNode<modmesh::CallerProfile>*, modmesh::RadixTreeNode<modmesh::CallerProfile>*, std::_Identity<modmesh::RadixTreeNode<modmesh::CallerProfile>*>, std::less<modmesh::RadixTreeNode<modmesh::CallerProfile>*>, std::allocator<modmesh::RadixTreeNode<modmesh::CallerProfile>*> >::_M_insert_unique<modmesh::RadixTreeNode<modmesh::CallerProfile>*>(modmesh::RadixTreeNode<modmesh::CallerProfile>*&&)
 20.00      0.04     0.01  1010000     0.00     0.00  modmesh::CallProfiler::start_caller(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::function<void ()> const&)
  0.00      0.05     0.00  3030000     0.00     0.00  std::_Function_handler<void (), modmesh::CallProfilerProbe::CallProfilerProbe(modmesh::CallProfiler&, char const*)::{lambda()#1}>::_M_manager(std::_Any_data&, std::_Any_data const&, std::_Manager_operation)
  0.00      0.05     0.00  1010000     0.00     0.00  modmesh::CallProfiler::end_caller()
  0.00      0.05     0.00  1000000     0.00     0.00  modmesh::CallProfilerProbe::CallProfilerProbe(modmesh::CallProfiler&, char const*)

gprof top 5: operations `1000`, repeats `1000`

  %   cumulative   self              self     total           
 time   seconds   seconds    calls  us/call  us/call  name    
 83.33      0.05     0.05  1001000     0.05     0.05  std::pair<std::_Rb_tree_iterator<modmesh::RadixTreeNode<modmesh::CallerProfile>*>, bool> std::_Rb_tree<modmesh::RadixTreeNode<modmesh::CallerProfile>*, modmesh::RadixTreeNode<modmesh::CallerProfile>*, std::_Identity<modmesh::RadixTreeNode<modmesh::CallerProfile>*>, std::less<modmesh::RadixTreeNode<modmesh::CallerProfile>*>, std::allocator<modmesh::RadixTreeNode<modmesh::CallerProfile>*> >::_M_insert_unique<modmesh::RadixTreeNode<modmesh::CallerProfile>*>(modmesh::RadixTreeNode<modmesh::CallerProfile>*&&)
 16.67      0.06     0.01  1001000     0.01     0.01  modmesh::CallProfiler::start_caller(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::function<void ()> const&)
  0.00      0.06     0.00  3003000     0.00     0.00  std::_Function_handler<void (), modmesh::CallProfilerProbe::CallProfilerProbe(modmesh::CallProfiler&, char const*)::{lambda()#1}>::_M_manager(std::_Any_data&, std::_Any_data const&, std::_Manager_operation)
  0.00      0.06     0.00  1001000     0.00     0.05  modmesh::CallProfiler::end_caller()
  0.00      0.06     0.00  1000000     0.00     0.01  modmesh::CallProfilerProbe::CallProfilerProbe(modmesh::CallProfiler&, char const*)

gprof top 5: operations `10000`, repeats `5`

  %   cumulative   self              self     total           
 time   seconds   seconds    calls  Ts/call  Ts/call  name    
100.00      0.01     0.01                             modmesh::profiling::detail::profile_empty_000026(unsigned long, unsigned long)
  0.00      0.01     0.00   150015     0.00     0.00  std::_Function_handler<void (), modmesh::CallProfilerProbe::CallProfilerProbe(modmesh::CallProfiler&, char const*)::{lambda()#1}>::_M_manager(std::_Any_data&, std::_Any_data const&, std::_Manager_operation)
  0.00      0.01     0.00    50005     0.00     0.00  modmesh::CallProfiler::end_caller()
  0.00      0.01     0.00    50005     0.00     0.00  modmesh::CallProfiler::start_caller(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::function<void ()> const&)
  0.00      0.01     0.00    50005     0.00     0.00  std::pair<std::_Rb_tree_iterator<modmesh::RadixTreeNode<modmesh::CallerProfile>*>, bool> std::_Rb_tree<modmesh::RadixTreeNode<modmesh::CallerProfile>*, modmesh::RadixTreeNode<modmesh::CallerProfile>*, std::_Identity<modmesh::RadixTreeNode<modmesh::CallerProfile>*>, std::less<modmesh::RadixTreeNode<modmesh::CallerProfile>*>, std::allocator<modmesh::RadixTreeNode<modmesh::CallerProfile>*> >::_M_insert_unique<modmesh::RadixTreeNode<modmesh::CallerProfile>*>(modmesh::RadixTreeNode<modmesh::CallerProfile>*&&)

gprof top 5: operations `50000`, repeats `1`

  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name    
100.00      0.01     0.01        2     5.00     5.00  modmesh::RadixTreeNode<modmesh::CallerProfile>::~RadixTreeNode()
  0.00      0.01     0.00   150003     0.00     0.00  std::_Function_handler<void (), modmesh::CallProfilerProbe::CallProfilerProbe(modmesh::CallProfiler&, char const*)::{lambda()#1}>::_M_manager(std::_Any_data&, std::_Any_data const&, std::_Manager_operation)
  0.00      0.01     0.00    50001     0.00     0.00  modmesh::CallProfiler::end_caller()
  0.00      0.01     0.00    50001     0.00     0.00  modmesh::CallProfiler::start_caller(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::function<void ()> const&)
  0.00      0.01     0.00    50001     0.00     0.00  std::pair<std::_Rb_tree_iterator<modmesh::RadixTreeNode<modmesh::CallerProfile>*>, bool> std::_Rb_tree<modmesh::RadixTreeNode<modmesh::CallerProfile>*, modmesh::RadixTreeNode<modmesh::CallerProfile>*, std::_Identity<modmesh::RadixTreeNode<modmesh::CallerProfile>*>, std::less<modmesh::RadixTreeNode<modmesh::CallerProfile>*>, std::allocator<modmesh::RadixTreeNode<modmesh::CallerProfile>*> >::_M_insert_unique<modmesh::RadixTreeNode<modmesh::CallerProfile>*>(modmesh::RadixTreeNode<modmesh::CallerProfile>*&&)

hot_name_reuse

operations	repeats	workload seconds
100	10000	8.356190E-02
1000	1000	8.568760E-02
10000	5	7.355400E-03
50000	1	2.580690E-02

gprof top 5: operations `100`, repeats `10000`

  %   cumulative   self              self     total           
 time   seconds   seconds    calls  Ts/call  Ts/call  name    
  0.00      0.00     0.00  3030000     0.00     0.00  std::_Function_handler<void (), modmesh::CallProfilerProbe::CallProfilerProbe(modmesh::CallProfiler&, char const*)::{lambda()#1}>::_M_manager(std::_Any_data&, std::_Any_data const&, std::_Manager_operation)
  0.00      0.00     0.00  1010000     0.00     0.00  modmesh::CallProfiler::end_caller()
  0.00      0.00     0.00  1010000     0.00     0.00  modmesh::CallProfiler::start_caller(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::function<void ()> const&)
  0.00      0.00     0.00  1010000     0.00     0.00  std::pair<std::_Rb_tree_iterator<modmesh::RadixTreeNode<modmesh::CallerProfile>*>, bool> std::_Rb_tree<modmesh::RadixTreeNode<modmesh::CallerProfile>*, modmesh::RadixTreeNode<modmesh::CallerProfile>*, std::_Identity<modmesh::RadixTreeNode<modmesh::CallerProfile>*>, std::less<modmesh::RadixTreeNode<modmesh::CallerProfile>*>, std::allocator<modmesh::RadixTreeNode<modmesh::CallerProfile>*> >::_M_insert_unique<modmesh::RadixTreeNode<modmesh::CallerProfile>*>(modmesh::RadixTreeNode<modmesh::CallerProfile>*&&)
  0.00      0.00     0.00  1000000     0.00     0.00  modmesh::CallProfilerProbe::CallProfilerProbe(modmesh::CallProfiler&, char const*)

gprof top 5: operations `1000`, repeats `1000`

  %   cumulative   self              self     total           
 time   seconds   seconds    calls  us/call  us/call  name    
 50.00      0.01     0.01  1001000     0.01     0.01  modmesh::CallProfiler::start_caller(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::function<void ()> const&)
 50.00      0.02     0.01  1000000     0.01     0.02  modmesh::CallProfilerProbe::CallProfilerProbe(modmesh::CallProfiler&, char const*)
  0.00      0.02     0.00  3003000     0.00     0.00  std::_Function_handler<void (), modmesh::CallProfilerProbe::CallProfilerProbe(modmesh::CallProfiler&, char const*)::{lambda()#1}>::_M_manager(std::_Any_data&, std::_Any_data const&, std::_Manager_operation)
  0.00      0.02     0.00  1001000     0.00     0.00  modmesh::CallProfiler::end_caller()
  0.00      0.02     0.00  1001000     0.00     0.00  std::pair<std::_Rb_tree_iterator<modmesh::RadixTreeNode<modmesh::CallerProfile>*>, bool> std::_Rb_tree<modmesh::RadixTreeNode<modmesh::CallerProfile>*, modmesh::RadixTreeNode<modmesh::CallerProfile>*, std::_Identity<modmesh::RadixTreeNode<modmesh::CallerProfile>*>, std::less<modmesh::RadixTreeNode<modmesh::CallerProfile>*>, std::allocator<modmesh::RadixTreeNode<modmesh::CallerProfile>*> >::_M_insert_unique<modmesh::RadixTreeNode<modmesh::CallerProfile>*>(modmesh::RadixTreeNode<modmesh::CallerProfile>*&&)

gprof top 5: operations `10000`, repeats `5`

  %   cumulative   self              self     total           
 time   seconds   seconds    calls  Ts/call  Ts/call  name    
  0.00      0.00     0.00   150015     0.00     0.00  std::_Function_handler<void (), modmesh::CallProfilerProbe::CallProfilerProbe(modmesh::CallProfiler&, char const*)::{lambda()#1}>::_M_manager(std::_Any_data&, std::_Any_data const&, std::_Manager_operation)
  0.00      0.00     0.00    50005     0.00     0.00  modmesh::CallProfiler::end_caller()
  0.00      0.00     0.00    50005     0.00     0.00  modmesh::CallProfiler::start_caller(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::function<void ()> const&)
  0.00      0.00     0.00    50005     0.00     0.00  std::pair<std::_Rb_tree_iterator<modmesh::RadixTreeNode<modmesh::CallerProfile>*>, bool> std::_Rb_tree<modmesh::RadixTreeNode<modmesh::CallerProfile>*, modmesh::RadixTreeNode<modmesh::CallerProfile>*, std::_Identity<modmesh::RadixTreeNode<modmesh::CallerProfile>*>, std::less<modmesh::RadixTreeNode<modmesh::CallerProfile>*>, std::allocator<modmesh::RadixTreeNode<modmesh::CallerProfile>*> >::_M_insert_unique<modmesh::RadixTreeNode<modmesh::CallerProfile>*>(modmesh::RadixTreeNode<modmesh::CallerProfile>*&&)
  0.00      0.00     0.00    50000     0.00     0.00  modmesh::CallProfilerProbe::CallProfilerProbe(modmesh::CallProfiler&, char const*)

gprof top 5: operations `50000`, repeats `1`

  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ns/call  ns/call  name    
100.00      0.01     0.01    50001   200.00   200.00  modmesh::CallProfiler::start_caller(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::function<void ()> const&)
  0.00      0.01     0.00   150003     0.00     0.00  std::_Function_handler<void (), modmesh::CallProfilerProbe::CallProfilerProbe(modmesh::CallProfiler&, char const*)::{lambda()#1}>::_M_manager(std::_Any_data&, std::_Any_data const&, std::_Manager_operation)
  0.00      0.01     0.00    50001     0.00     0.00  modmesh::CallProfiler::end_caller()
  0.00      0.01     0.00    50001     0.00     0.00  std::pair<std::_Rb_tree_iterator<modmesh::RadixTreeNode<modmesh::CallerProfile>*>, bool> std::_Rb_tree<modmesh::RadixTreeNode<modmesh::CallerProfile>*, modmesh::RadixTreeNode<modmesh::CallerProfile>*, std::_Identity<modmesh::RadixTreeNode<modmesh::CallerProfile>*>, std::less<modmesh::RadixTreeNode<modmesh::CallerProfile>*>, std::allocator<modmesh::RadixTreeNode<modmesh::CallerProfile>*> >::_M_insert_unique<modmesh::RadixTreeNode<modmesh::CallerProfile>*>(modmesh::RadixTreeNode<modmesh::CallerProfile>*&&)
  0.00      0.01     0.00    50000     0.00   200.00  modmesh::CallProfilerProbe::CallProfilerProbe(modmesh::CallProfiler&, char const*)

ThreeMonth03

@yungyuc Please take a look. This pull request is quite long, so I'm wondering whether there are better ways to generate code and profile the benchmark on different platform.

ThreeMonth03 · 2026-05-26T07:05:50Z

+      - name: make cprof
+        if: runner.os == 'Linux'
+        run: |
+          make cprof
+


Profile profiler only on linux.

ThreeMonth03 · 2026-05-26T07:07:40Z

+    return nullptr;
+}
+
+bool run_named_case(std::string_view label, std::size_t size, std::size_t repeat_count)


Run different types of functions with different hyperparameter.

ThreeMonth03 · 2026-05-26T07:08:56Z

+    std::cout << "RESULT workload=" << label
+              << " operations=" << operation_count
+              << " repeats=" << repeat_count
+              << " workload_seconds=" << elapsed.count()
+              << '\n';


This file will print the wall time of benchmark, because we cannot obtain wall time from gprof

ThreeMonth03 · 2026-05-26T07:11:06Z

+void configure_large_stack()
+{
+#if defined(__linux__)
+    rlimit limit{};
+    if (getrlimit(RLIMIT_STACK, &limit) == 0)
+    {
+        if (RLIM_INFINITY == limit.rlim_max || limit.rlim_cur < limit.rlim_max)
+        {
+            limit.rlim_cur = limit.rlim_max;
+            static_cast<void>(setrlimit(RLIMIT_STACK, &limit));
+        }
+    }
+#endif
+}


Configure enough stack size at first, because the depth of callers may be 50000.

ThreeMonth03 · 2026-05-26T07:11:53Z

+std::array<case_definition, 4> const case_definitions{{
+    {"wide_siblings", &workload::run_wide_siblings},
+    {"deep_chain", &workload::run_deep_chain},
+    {"balanced_tree", &workload::run_balanced_tree},
+    {"hot_name_reuse", &workload::run_hot_name_reuse},
+}};


4 kinds of benchmark. They are generated by python scripts.

ThreeMonth03 · 2026-05-26T07:13:23Z

+add_custom_command(
+    OUTPUT ${CPROF_GENERATED_SOURCES}
+    COMMAND "${PYTHON_EXECUTABLE}" "${CPROF_GENERATOR}"
+        --output-dir "${CPROF_GENERATED_DIR}"
+        --shards "${CPROF_SHARD_COUNT}"
+    DEPENDS "${CPROF_GENERATOR}"
+    VERBATIM
+)


Generate benchmarks.

ThreeMonth03 · 2026-05-26T07:14:30Z

I've tried to generate cpp files with macro, but it is too slow to generate 50000 functions.

ThreeMonth03 · 2026-05-26T07:17:03Z

+
+
+if __name__ == "__main__":
+    main()


Scripts to run the execute file of profiling/cprof/callprofiler_gprof.cpp.

ThreeMonth03 · 2026-05-26T07:19:22Z

I'm not sure whether to put cpp files in /profiling.

yungyuc

Please describe what you wanted to see and summarize what you found in each of the 4 profiling cases in the opening comment. It is not obvious what you want to say with the plain profiling data.

To profile the CallProfiler is a good idea.

ThreeMonth03

@yungyuc Please take a look. I think I've pointed out why we need to optimize some hotspots now.

ThreeMonth03 marked this pull request as draft May 25, 2026 17:55

ThreeMonth03 force-pushed the profile-callprofiler-cases branch 2 times, most recently from 22b55de to 65972c4 Compare May 26, 2026 06:03

ThreeMonth03 marked this pull request as ready for review May 26, 2026 07:05

ThreeMonth03 commented May 26, 2026

View reviewed changes

Profile callprofiler with different testcases

45dc6fb

ThreeMonth03 force-pushed the profile-callprofiler-cases branch from 65972c4 to 45dc6fb Compare May 26, 2026 08:07

yungyuc requested changes May 26, 2026

View reviewed changes

yungyuc assigned ThreeMonth03 May 26, 2026

yungyuc added the performance Profiling, runtime, and memory consumption label May 26, 2026

ThreeMonth03 requested a review from yungyuc May 27, 2026 18:02

ThreeMonth03 commented May 27, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Profile callprofiler with different testcases#829

Profile callprofiler with different testcases#829
ThreeMonth03 wants to merge 1 commit into
solvcon:masterfrom
ThreeMonth03:profile-callprofiler-cases

ThreeMonth03 commented May 25, 2026 •

edited

Loading

Uh oh!

ThreeMonth03 left a comment

Uh oh!

ThreeMonth03 May 26, 2026

Uh oh!

ThreeMonth03 May 26, 2026

Uh oh!

ThreeMonth03 May 26, 2026

Uh oh!

ThreeMonth03 May 26, 2026

Uh oh!

ThreeMonth03 May 26, 2026

Uh oh!

ThreeMonth03 May 26, 2026

Uh oh!

ThreeMonth03 May 26, 2026

Uh oh!

ThreeMonth03 May 26, 2026

Uh oh!

ThreeMonth03 May 26, 2026

Uh oh!

yungyuc left a comment

Uh oh!

ThreeMonth03 left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

	RadixTreeNode<T> * get_child(key_type key) const
	{
	auto it = std::find_if(m_children.begin(), m_children.end(), [&](const auto & child)
	{ return child->key() == key; });
	return (it != m_children.end()) ? it->get() : nullptr;
	}

Conversation

ThreeMonth03 commented May 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Introduction

Analysis

Appendix

Benchmark

Function types

wide_siblings

gprof top 5: operations 100, repeats 10000

gprof top 5: operations 1000, repeats 1000

gprof top 5: operations 10000, repeats 5

gprof top 5: operations 50000, repeats 1

deep_chain

gprof top 5: operations 100, repeats 10000

gprof top 5: operations 1000, repeats 1000

gprof top 5: operations 10000, repeats 5

gprof top 5: operations 50000, repeats 1

balanced_tree

gprof top 5: operations 100, repeats 10000

gprof top 5: operations 1000, repeats 1000

gprof top 5: operations 10000, repeats 5

gprof top 5: operations 50000, repeats 1

hot_name_reuse

gprof top 5: operations 100, repeats 10000

gprof top 5: operations 1000, repeats 1000

gprof top 5: operations 10000, repeats 5

gprof top 5: operations 50000, repeats 1

Uh oh!

ThreeMonth03 left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

yungyuc left a comment

Choose a reason for hiding this comment

Uh oh!

ThreeMonth03 left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

ThreeMonth03 commented May 25, 2026 •

edited

Loading

gprof top 5: operations `100`, repeats `10000`

gprof top 5: operations `1000`, repeats `1000`

gprof top 5: operations `10000`, repeats `5`

gprof top 5: operations `50000`, repeats `1`

gprof top 5: operations `100`, repeats `10000`

gprof top 5: operations `1000`, repeats `1000`

gprof top 5: operations `10000`, repeats `5`

gprof top 5: operations `50000`, repeats `1`

gprof top 5: operations `100`, repeats `10000`

gprof top 5: operations `1000`, repeats `1000`

gprof top 5: operations `10000`, repeats `5`

gprof top 5: operations `50000`, repeats `1`

gprof top 5: operations `100`, repeats `10000`

gprof top 5: operations `1000`, repeats `1000`

gprof top 5: operations `10000`, repeats `5`

gprof top 5: operations `50000`, repeats `1`